A concrete syntax tree, parse tree, or parsing tree[1] is an ordered, rooted tree that represents the syntactic structure of a string according to some formal grammar. Parse trees are usually constructed according to one of two competing relations: the constituency relation of constituency grammars (= phrase structure grammars) or the dependency relation of dependency grammars. Parse trees are distinct from abstract syntax trees (also known simply as syntax trees) in that their structure and elements more concretely reflect the syntax of the input language. Parse trees may be generated for sentences in natural languages (see natural language processing), as well as during the processing of computer languages, such as programming languages.
The constituency-based parse trees of constituency grammars (= phrase structure grammars) distinguish between terminal and non-terminal nodes. The interior nodes are labeled by non-terminal categories of the grammar, while the leaf nodes are labeled by terminal categories. The image below represents a constituency-based parse tree; it shows the syntactic structure of the English sentence John hit the ball:
This parse tree is simplified; for more information, see X-bar theory. The parse tree is the entire structure, starting from S and ending in each of the leaf nodes (John, hit, the, ball). The following abbreviations are used in the tree:
Each node in the tree is either a root node, a branch node, or a leaf node. S is the root node, NP and VP are branch nodes, and John, hit, the, and ball are all leaf nodes. The leaves are the lexical tokens of the sentence.[2] A node can also be referred to as a parent node or a child node. A parent node is one that has at least one other node linked to it by a branch directly below it. In the example, S is a parent of both NP and VP. A child node is one that has a node directly above it to which it is linked by a branch of the tree. In the example, hit is a child node of V. The terms mother and daughter are also sometimes used for this relationship.
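The node terminology above can be illustrated with a minimal Python sketch. The `Node` class and the helper functions `leaves` and `classify` are hypothetical names introduced for illustration only, and the exact shape of the tree (in particular, the N node under the subject NP) is an assumption, since the figure is not reproduced here.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a constituency-based parse tree (illustrative only)."""
    label: str
    children: list[Node] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: Node) -> list[str]:
    """Return the leaf labels from left to right (the lexical tokens)."""
    if node.is_leaf():
        return [node.label]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result


def classify(node: Node, is_root: bool = True) -> list[tuple[str, str]]:
    """List (label, kind) pairs in pre-order; kind is root, branch, or leaf."""
    if is_root:
        kind = "root"
    elif node.is_leaf():
        kind = "leaf"
    else:
        kind = "branch"
    result = [(node.label, kind)]
    for child in node.children:
        result.extend(classify(child, is_root=False))
    return result


# Assumed structure for "John hit the ball".
tree = Node("S", [
    Node("NP", [Node("N", [Node("John")])]),
    Node("VP", [
        Node("V", [Node("hit")]),
        Node("NP", [Node("D", [Node("the")]), Node("N", [Node("ball")])]),
    ]),
])

print(leaves(tree))
print(classify(tree))
```

Under these assumptions, the interior nodes carry non-terminal labels, and reading the leaves from left to right recovers the tokens of the sentence in order.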
The dependency-based parse trees of dependency grammars[3] see all nodes as terminal, which means they do not acknowledge the distinction between terminal and non-terminal categories. They are simpler on average than constituency-based parse trees because they contain many fewer nodes. The dependency-based parse tree for the example sentence above is as follows:
This parse tree lacks the phrasal categories (S, VP, and NP) seen in the constituency-based counterpart above. As in the constituency-based tree, however, constituent structure is acknowledged: any complete subtree of the tree is a constituent. Thus this dependency-based parse tree acknowledges the subject noun John and the object noun phrase the ball as constituents, just as the constituency-based parse tree does.
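The idea that every complete subtree of a dependency tree is a constituent can be sketched in Python as follows. The head-index encoding and the function names `complete_subtree` and `constituents` are hypothetical choices made for illustration. The dependency structure itself (hit as the root, John and ball depending on hit, the depending on ball) is an assumption, since the figure is not reproduced here.

```python
words = ["John", "hit", "the", "ball"]
# heads[i] is the index of the word that word i depends on; -1 marks the root.
# Assumed analysis: "hit" is the root, "the" depends on "ball".
heads = [1, -1, 3, 1]


def complete_subtree(i: int, heads: list[int]) -> list[int]:
    """Return the sorted indices of word i and all of its descendants."""
    indices = [i]
    for j, head in enumerate(heads):
        if head == i:
            indices.extend(complete_subtree(j, heads))
    return sorted(indices)


def constituents(words: list[str], heads: list[int]) -> list[str]:
    """Return the word string of the complete subtree rooted at each word."""
    return [
        " ".join(words[k] for k in complete_subtree(i, heads))
        for i in range(len(words))
    ]


print(constituents(words, heads))
```

Under this assumed analysis, the tree has only four nodes, one per word, and the complete subtrees rooted at John and ball yield the subject noun and the object noun phrase, respectively. The verb hit on its own is not a complete subtree, because its subtree includes its dependents.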
The constituency vs. dependency distinction is far-reaching. Whether the additional syntactic structure associated with constituency-based parse trees is necessary or beneficial is a matter of debate.